***************************
***
**HOW DOES THE EDUCATIONAL CLEAVAGE STACK UP AGAINST THE CLASSIC CLEAVAGES OF THE PAST?
** Liesbet Hooghe and Gary Marks, University of North Carolina at Chapel Hill & European University Institute, Florence

** West European Politics

** submitted Aug 4, 2024, accepted Dec 17, 2024.
***
****************************

***************************************************

**Social Structuration on the Education Cleavage (FIGURES 1-3): P for education, occupation, gender in Western Europe, 1975-2020

***************************************************

use "EBESSnonvotersmerge_dec24.dta"

*Figure 1: education
twoway (qfitci higher_dif100 year if (TAN==1) & EU9==1 & max_n>30 & higher_dif100!=. , ciplot(rline) clcolor(stc8) clwidth(thick) alcolor(stc8) alwidth(vthin)) (qfitci higher_dif100 year if (GAL==1) & EU9==1 & max_n>30 & higher_dif100!=., ciplot(rline) clcolor(green) clwidth(thick) alcolor(green) alwidth(vthin)) (qfit higher_dif100 year if (family==16) & EU9==1 & max_n>30 & max_n!=. & higher_dif100!=. & year>1975 , lcolor(gs10) lpattern(longdash) lwidth(thick)) (qfit higher_dif100 year if (LEFT==1) & EU9==1 & max_n>30 & higher_dif100!=., lcolor(gs6) lwidth(thick)) (qfit higher_dif100 year if (RIGHT==1) & EU9==1 & max_n>30 & higher_dif100!=., lcolor(gs12) lwidth(thick)), ylabel(-20(5)20) xtitle(1975-2020) xlabel(1975(5)2020) yline(0, lcolor(black)) title("Higher educated") legend(size(small) position(3)) 

*Figure 2: occupation
twoway (qfitci occ1_dif100 year if (TAN==1) & EU9==1 & max_n>30 & occ1_dif100!=. , ciplot(rline) clcolor(stc8) clwidth(thick) alcolor(stc8) alwidth(vthin)) (qfitci occ1_dif100 year if (GAL==1) & EU9==1 & max_n>30 & occ1_dif100!=., ciplot(rline) clcolor(green) clwidth(thick) alcolor(green) alwidth(vthin)) (qfit occ1_dif100 year if (family==16) & EU9==1 & max_n>30 & max_n!=. & occ1_dif100!=. , lcolor(gs10) lpattern(longdash) lwidth(thick)) (qfit occ1_dif100 year if (LEFT==1) & EU9==1 & max_n>30 & occ1_dif100!=., lcolor(gs6) lwidth(thick)) (qfit occ1_dif100 year if (RIGHT==1) & EU9==1 & max_n>30 & occ1_dif100!=., lcolor(gs12) lwidth(thick)), ylabel(-10(5)15) xlabel(1975(5)2020) yline(0, lcolor(black)) title("Workers") legend(size(small) position(3))

*Figure 3: gender
twoway (qfitci female_dif100 year if (TAN==1) & EU9==1 & max_n>30 & female_dif100!=. & year>1974, ciplot(rline) clcolor(stc8) clwidth(thick) alcolor(stc8) alwidth(vthin)) (qfitci female_dif100 year if (GAL==1) & EU9==1 & max_n>30 & female_dif100!=. & year>1974, ciplot(rline) clcolor(green) clwidth(thick) alcolor(green) alwidth(vthin)) (qfit female_dif100 year if (family==16) & EU9==1 & max_n>30 & max_n!=. & female_dif100!=. & year>1974, lcolor(gs10) lpattern(longdash) lwidth(thick)) (qfit female_dif100 year if (LEFT==1) & EU9==1 & max_n>30 & female_dif100!=. & year>1974, lcolor(gs6) lwidth(thick)) (qfit female_dif100 year if (RIGHT==1) & EU9==1 & max_n>30 & female_dif100!=.& year>1974 , lcolor(gs12) lwidth(thick)), ylabel(-20(4)8) xlabel(1975(5)2020) yline(0, lcolor(black)) title("Women") legend(size(small) position(3)) 


*TEXT
*stats under figure 1*
tabstat max_n if EU9==1 & family!=17, stats (sum)
tabstat party_id if EU9==1 & family!=17 & max_n>30, stats (N)


******************************************

**Classic benchmarks (TABLES 1-3): Germany, Norway, Britain

******************************************
 
*Table 1: P and its components for workers in the German SPD in 1957 and 2020 
use "germany 1957 working.dta"
*download original from GESIS Data Archive, Cologne. ZA3272 Data file Version 2.0.0, https://doi.org/10.4232/1.11991
gen occupation=v19
tab occupation vote57 if occupation!=9 & v2==3, row column 

use "ESS_master_12_18_2024.dta", clear
*download original from ESS data portal:  https://www.europeansocialsurvey.org/data-portal
tab oesch8 party_id if essround==10 & country==3 & family!=17 & oesch8!=., column row missing // cut out the "Don't know" responses
tab oesch8 party if essround==10 & country==3 & family!=17 & oesch8!=., column row missing // party labels

*Table 2:  P and its components for workers in the Norwegian Labor Party (AP) in 1957 and 2020
use "Norway1957working.dta"
*download original from SIKT - Kunnskapssektorens tjenesteleverandør | Norwegian Agency for Shared Services in Education and Research - https://sikt.no/kontakt-oss
tab v245 v019 if v019!=0, column row missing	

use "ESS_master_12_18_2024.dta"  
tab oesch8 party_id if essround==10 & country==35 & family!=17 & oesch8!=., column row missing
tab oesch8 party if essround==10 & country==35 & family!=17 & oesch8!=., column row missing // party labels

 
* Table 3: P and its components for workers in the British Labour Party in 1964 and 2020
use "BNES working.dta"
*download original from ICPSR: https://www.icpsr.umich.edu/web/ICPSR/studies/7233

tab occup64HH vote64 if occup64HH!=9 & (vote64!=0| v356==5), column row

use "ESS_master_12_18_2024.dta"  
tab oesch8 party_id if essround==10 & country==11 & family!=17 & oesch8!=. , column row missing
tab oesch8 party if essround==10 & country==11 & family!=17 & oesch8!=. , column row missing

	

******************************************

**GAL and TAN parties in time (TABLES 4-5): six European countries

*******************************************

**OUTPUT EXCEL FILES -- USE SUMMARY SHEET 
*a) "TAN P breakdown using survey data closed system aug 2024.xlsx"
*b) "GAL P breakdown using survey data closed system aug 2024.xlsx"

**INPUT:

*A) EARLY DECADE
use "EBtrend_dec2024withnonvoters.dta" 

drop if voteint==. | age<21 | led==. | occ1==.

*Germany TAN
tab led family3 if country==3 & year>=1995 & year<2001, row column

*NL TAN
tab led family3 if country==10 & year>=1995 & year<2001, row column

*Belgium TAN
tab led family3 if country==1 & year>=1995 & year<2001, row column

*France TAN
tab led family3 if country==6 & year>=1995 & year<2001, row column

*Austria TAN
tab led family3 if country==13 & year>=1995 & year<2001, row column

*DK TAN
*the definition of higher/lower education is too loose for Denmark, where many finish high school at 20 or 21 years old
tab educ if  year>1994 & year<=2000
tab educ country if year>1994 & year<=2000, column nofreq
gen led2=led
replace led2=1 if educ==7| educ==8
gen higher2=higher
replace higher2=0 if educ==7| educ==8

tab led2 family3 if country==2 & year>=1995 & year<2001, row column
tab occ1 family3 if country==2 & year>=1995 & year<2001, row column

***GAL****
use "EBtrend_dec2024withnonvoters.dta", clear

drop if voteint==. | age<21 | higher==. 

*Germany GAL
tab higher family3 if country==3 & year>=1990 & year<1995, row column

*NL GAL
tab higher family3 if country==10 & year>=1990 & year<1995, row column

*Austria GAL
tab higher family3 if country==13 & year>=1995 & year<2001, row column

*BE GAL
tab higher family3 if country==1 & year>=1990 & year<1995, row column

*FR GAL
tab higher family3 if country==6 & year>=1990 & year<1995, row column

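*DK GAL
*higher2 was defined in the TAN block above and is lost when the data are reloaded; recreate it with the same rule
gen higher2=higher
replace higher2=0 if educ==7| educ==8
tab higher2 family3 if country==2 & year>=1990 & year<1995, row column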

*B) LATEST DECADE

use "ESS_master_12_18_2024.dta", clear

drop if age<21| occ1==.| led==.
drop if vote==3
drop if country>36

tab family3 if country==3, missing

*Germany TAN
tab led family3 if country==3 & year>=2016 & year<2021, row column

*NL TAN
tab led family3 if country==10 & year>=2016 & year<2021, row column

*BE TAN
tab led family3 if country==1 & year>=2016 & year<2021, row column

*France TAN
tab led family3 if country==6 & year>=2016 & year<2021, row column

*Austria TAN
tab led family3 if country==13 & year>=2016 & year<2021, row column

*DK TAN
tab led2 family3 if country==2 & year>=2016 & year<2021, row column

/*GAL*/ 
drop if age<21| occ4==.| higher==.
drop if vote==3
drop if country>36

*Germany GAL
tab higher family3 if country==3 & year>=2016 & year<2021, row column

*NL GAL
tab higher family3 if country==10 & year>=2016 & year<2021, row column

*BE GAL
tab higher family3 if country==1 & year>=2016 & year<2021, row column

*France GAL
tab higher family3 if country==6 & year>=2016 & year<2021, row column

*Austria GAL
tab higher family3 if country==13 & year>=2016 & year<2021, row column

*DK GAL
tab higher family3 if country==2 & (year==2014| year==2018), row column


**********************************************

**American Exceptionalism and the Education Cleavage: FIGURES 4-5, TABLES 6-7

**********************************************
*download Time Series Cumulative Data File (1948-2020) from https://electionstudies.org/data-center/

*A. CREATE P VALUES

use "anes_timeseries_cdf_stata_20220916.dta", clear

tab VCF0004 // survey year
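* Added sketch (assumption, not in the original script): the collapse and the
* 2020 tables further down refer to variables named year and weight, which this
* script never creates. year is taken from VCF0004 (survey year, per the comment
* above). The choice of VCF0009z as the weight is a guess at the intended
* full-sample weight; substitute the weight actually used for the paper.
gen year=VCF0004
gen weight=VCF0009z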

gen vote=VCF0736 // did the person vote in House elections
recode vote (0=0)
recode vote (5=2)
recode vote (7=3)
label define vote 0 "Non-voting" 1 "Democrat" 2 "Republican" 3 "Other", replace
label values vote vote
label variable vote "party voted in House elections"

tab VCF0110 // level of education
gen lowed2=1 if VCF0110<4 // some college or less
replace lowed2=0 if VCF0110==4 // full college
label define lowed2 1 "1.at most some college" 0 "0.full college or more", replace
label values lowed2 lowed2
label variable lowed2 "lower educated (broad def)" 
gen higher2=0 if VCF0110<4 // less than full college
replace higher2=1 if VCF0110==4 // full college
label define higher2 0 "0.at most some college" 1 "1.full college or more", replace
label values higher2 higher2
label variable higher2 "higher educated (narrow def)"

tab VCF0104 // 3-cat gender (2=female | 3 = other)
gen female=1 if VCF0104==2
replace female=0 if VCF0104==1
label define female 0 "male" 1 "female", replace
label values female female

tab VCF0105b // 4-category ethnicity/race
gen white=1 if VCF0105b==1
replace white=0 if VCF0105b>1 & VCF0105b<5
label define white 0 "other" 1 "white, non-Hispanic", replace
label values white white
label variable white "white-nonhispanic"

gen lowedwhite2=1 if lowed2==1 & white==1
replace lowedwhite2=0 if lowedwhite2==.
label define lowedwhite2 1 "1.lower-educated (broad) & white" 0 "0.other: full college or nonwhite", replace
label values lowedwhite2 lowedwhite2
label variable lowedwhite2 "low-educated whites (some college=lowed)"
tab lowedwhite2 lowed2

gen femalehigh2=1 if female==1 & higher2==1
replace femalehigh2=0 if femalehigh2==. & (female==0 | (female==1 & higher2==0))
label define femalehigh2 1 "1.higher-ed female" 0 "0.other", replace
label values femalehigh2 femalehigh2
label variable femalehigh2 "higher educated female (narrow)"

save "anes working dec2024.dta", replace

foreach var of varlist female lowed2 higher2 white lowedwhite2 femalehigh2  { 
	bys VCF0004: egen mean_`var'=mean(`var') 	
	}
	
bys vote VCF0004: gen np=_n	
bys vote VCF0004: egen max_np=max(np) // max_np captures total number of observations (respondents) within party year   

collapse (mean) weight max_np year  ///
	female lowed2 higher2 white lowedwhite2 femalehigh2 ///
	mean_female mean_lowed2 mean_higher2 mean_white mean_lowedwhite2 mean_femalehigh2, by (vote VCF0004)

foreach var of varlist female lowed2 higher2 white lowedwhite2 femalehigh2 { 
	gen `var'_dif=`var'-mean_`var'
	gen `var'_dif_abs=abs(`var'_dif) // take abs value of dif 
	}
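	
* Added check (illustrative, not in the original script): each *_dif variable is
* the group's share within a party-year minus its share among all respondents in
* that year, so higher2_dif = 0.05 would mean college graduates are
* over-represented by 5 percentage points. The listing below prints the 2020
* components for inspection, using only variables created above.
list vote VCF0004 higher2 mean_higher2 higher2_dif if VCF0004==2020, noobs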
	
label variable female_dif "P for female"
label variable lowed2_dif "P for lower educated (broad)" 
label variable higher2_dif "P for higher-educated (narrow)"
label variable white_dif "P for white non-hispanic"
label variable lowedwhite2_dif "P for lower-ed whites (broad)"
label variable femalehigh2_dif "P for higher-ed females (narrow)"	
	

gen lowedwhite2_dif100=lowedwhite2_dif*100    // lowed2=some college or less
gen femalehigh2_dif100=femalehigh2_dif*100    // high2=four-year college or more
label variable femalehigh2_dif100 "Higher educated females (narrow)"
label variable lowedwhite2_dif100 "Lower-educated whites (broad)"

save "ANES Pvalues dec2024.dta", replace

*B. CREATE FIGURES 4-5

use "ANES Pvalues dec2024.dta"	

*FIGURE 4: P for education and ethnicity in the United States, 1952-2020

twoway (scatter lowedwhite2_dif100 VCF0004 if vote==1, ms(O) msize(tiny) mcolor(blue)) (qfitci lowedwhite2_dif100 VCF0004 if vote==1 , clcolor(blue%90) clwidth(medthick) ciplot(rline) blwidth(vthin) blpattern(dash) lcolor(blue%50)) (scatter lowedwhite2_dif100 VCF0004 if vote==2, msize(tiny) mcolor(red)) (qfitci lowedwhite2_dif100 VCF0004 if vote==2, clcolor(red%90) clwidth(medthick) ciplot(rline) blwidth(vthin) blpattern(dash) lcolor(red%50)) (scatter lowedwhite2_dif100 VCF0004 if vote==0, msize(tiny) mcolor(gs8)) (qfitci lowedwhite2_dif100 VCF0004 if vote==0, clcolor(gs8%90) clwidth(medthick) ciplot(rline) blwidth(vthin) blpattern(dash) lcolor(gs8%50)), ylabel(-14(02)18, labsize(small)) xlabel(1948(4)2020, labsize(small) angle(forty_five)) yline(0, lcolor(black)) legend(on) clegend(title("Lower educated whites"))  

*FIGURE 5: P for education and gender in the United States, 1952-2020

twoway (scatter femalehigh2_dif100 VCF0004 if vote==1, ms(O) msize(tiny) mcolor(blue)) (qfitci femalehigh2_dif100 VCF0004 if vote==1 , clcolor(blue%90) clwidth(medthick) ciplot(rline) blwidth(vthin) blpattern(dash) lcolor(blue%50)) (scatter femalehigh2_dif100 VCF0004 if vote==2, msize(tiny) mcolor(red)) (qfitci femalehigh2_dif100 VCF0004 if vote==2, clcolor(red%90) clwidth(medthick) ciplot(rline) blwidth(vthin) blpattern(dash) lcolor(red%50)) (scatter femalehigh2_dif100 VCF0004 if vote==0, msize(tiny) mcolor(gs8)) (qfitci femalehigh2_dif100 VCF0004 if vote==0, clcolor(gs8%90) clwidth(medthick) ciplot(rline) blwidth(vthin) blpattern(dash) lcolor(gs8%50)), ylabel(-08(02)10, labsize(small)) xlabel(1948(4)2020, labsize(small) angle(forty_five)) yline(0, lcolor(black)) legend(on) clegend(title("High-ed women"))  

*C. CREATE TABLES 6-7

use "anes working dec2024.dta"

*TABLE 6. P and its components for lower-educated whites in 2020

tab lowedwhite2 vote [aweight=weight] if year==2020, row column freq


*TABLE 7. P and its components for college educated women in 2020

tab femalehigh2 vote [aweight=weight] if year==2020, row column freq

**TEXT ON higher-educated women vs. Black respondents, city dwellers, agnostics
tab black vote [aweight=weight] if year==2020, row column freq
tab urb1 vote [aweight=weight] if year==2020, row column freq
tab agnostic vote [aweight=weight] if year==2020, row column freq

***TEXT ON 2022
use "anes_pilot_2022_stata_20221214.dta"
*downloaded from https://electionstudies.org/data-center/2022-pilot-study/

gen higher=0 if educ<5
replace higher=1 if educ==5| educ==6 // full college+

gen white=1 if rwh==1 & eth!=1
replace white=0 if white==.

gen female=1 if gender==2
replace female=0 if gender==1

gen lowedwhite=1 if white==1 & higher==0
replace lowedwhite=0 if lowedwhite==.

gen femalehigh=1 if female==1 & higher==1
replace femalehigh=0 if femalehigh==.

tab lowedwhite vote [aweight=weight], row column freq

tab femalehigh vote [aweight=weight], row column freq